Paste plugin enhancements
Todd Northrop ("Speednet")
June 29, 2009


SUMMARY: 

Paste as Text has been changed from a dialog to a toggle button on the 
toolbar. When it's "on", all pastes done by the user are done as plain 
text. When it's "off", paste happens normally. 

The Paste as Text button can be "sticky" or not. The default is "not 
sticky", meaning that it reverts to normal paste after the next paste 
initiated by the user. 

The first time the Paste as Text button is clicked, the user is shown a 
message indicating that they are in plain text mode, and it explains how 
to return to normal paste mode. The plugin can optionally be configured 
to show that message every time it is clicked, but the default is to 
only show the message once. 

An expando property on the editor (pasteAsPlainText) indicates if the 
paste is in plain text mode. 
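
To illustrate the toggle behavior described above, here is a minimal 
sketch that uses a plain object in place of a real editor instance. Only 
the pasteAsPlainText property name comes from the plugin; the function 
names and the "sticky" argument are hypothetical, and the real plugin 
wires this into the toolbar button and the paste handler. 

```javascript
// Hypothetical stand-in for an editor instance. Only the property name
// pasteAsPlainText is taken from the plugin.
function createFakeEditor() {
	return { pasteAsPlainText: false };
}

// Called when the toolbar button is clicked: flips the mode and
// returns the new state.
function togglePasteAsText(ed) {
	ed.pasteAsPlainText = !ed.pasteAsPlainText;
	return ed.pasteAsPlainText;
}

// Called after each user-initiated paste. Returns the mode that applied
// to this paste, and reverts to normal paste unless the button is sticky.
function afterPaste(ed, sticky) {
	var wasPlain = ed.pasteAsPlainText;
	if (wasPlain && !sticky) {
		ed.pasteAsPlainText = false;
	}
	return wasPlain;
}
```

With sticky set to false, one click affects exactly one paste; with 
sticky set to true, the mode stays on until the button is clicked again. 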

A new private method, _insertPlainText(), handles the translation and 
insertion of plain text. The new plain text functionality is hooked into 
the current paste function in the process() function within the init() 
method. Overall, there is very little impact on how a paste takes place 
from the old plugin to the new one. 

The MS Office cleanup functions have been radically changed in order to 
make the existing code much more efficient, and to retain as many of the 
office styles as possible, translating them into valid CSS. One major 
hurdle that remains is properly translating pastes from Excel. 
Currently, such pastes are either coming through with all styles 
removed, or in the case of WebKit browsers, the Excel table is getting 
corrupted. This also happens in the current paste plugin. (See Known 
Issues below.) 

Other significant changes are the addition of several new configuration 
options required for the plain text functionality, and several 
improvements to the overall code base in the plugin. 

The DETAIL section below includes information about specific changes. 


DETAIL:

* Made a number of fixes to JavaScript code, such as:

 - Removed var declaration with same name as inner function. Declaring 
an inner function creates the function name as a local name, with no var 
declaration necessary. 

 - Removed extra semi-colon after each bare function declaration - 
existed in a few places (I got specific instruction/clarification from 
Douglas Crockford on this syntax). 

 - Added "i" flag to RegExps that strip leading and trailing 
non-breaking spaces, because sometimes HTML entities and tags come 
through as upper-case, e.g., "<BR>". 

 - Added "i" flag to the RegExp that tests if it's Word content. 

 - Fixed up some RegExps, for example removing some unnecessary escape 
chars - clutter with no benefit, e.g., [^\"]* should be simply [^"]* 
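
As an illustration of the "i" flag fix above, the following sketch trims 
leading and trailing non-breaking spaces and <br> tags regardless of 
case. The two patterns and the trimPasted() name are assumptions written 
for this example; they are not copied from the plugin. 

```javascript
// Illustrative patterns only; the plugin's actual expressions may differ.
// The "i" flag lets "<BR>" and "&NBSP;" match as well as the lower-case forms.
var leadingJunk = /^(?:&nbsp;|\u00a0|<br\s*\/?>|\s)+/i;
var trailingJunk = /(?:&nbsp;|\u00a0|<br\s*\/?>|\s)+$/i;

// Removes leading and trailing spacing debris from pasted HTML.
function trimPasted(html) {
	return html.replace(leadingJunk, "").replace(trailingJunk, "");
}
```

Without the "i" flag, an upper-case "<BR>" at the start of the content 
would be left in place. 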


* Improved support/translation of MS Office content:

 - Added support for *all* MS Office product tags, not just <v:...> and 
<o:...>, e.g., PowerPoint is <p:...>, Excel is <x:...>, etc. 

 - Got rid of elimination of font and div tags (so these tags will make 
it into the pasted content). This was responsible for stripping out all 
font formatting when pasting Excel content through the Paste as Word 
dialog. 


* Massive simplification of RegExp tests to eliminate bad tags, which 
should be a nice performance boost and be easier to read/maintain. It is 
faster to have one larger RegExp in this case (rather than a bunch of 
smaller RegExps in the process() loop) because the RegExp begins with a 
static char ("<"), which quickly jumps to the possible matches, and only 
does so one time. The old method looped through all possible matches 
multiple times, redundantly. (i.e., don't think it's slower because it's 
longer -- the opposite is true.) This is starting at line 290 in the new 
code. 
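
The difference between the two approaches can be sketched as follows. 
The tag list here (namespaced Office tags plus meta and link) is a short 
hypothetical stand-in for the plugin's real list, and both function names 
are made up for the example. 

```javascript
// Old style: several small expressions, each scanning the whole string.
function stripSeparately(html) {
	var patterns = [
		/<\/?\w+:\w+\b[^>]*>/gi,   // namespaced Office tags such as <o:p>
		/<\/?meta\b[^>]*>/gi,
		/<\/?link\b[^>]*>/gi
	];
	for (var i = 0; i < patterns.length; i++) {
		html = html.replace(patterns[i], "");
	}
	return html;
}

// New style: one expression with an alternation. Every candidate match
// starts at a literal "<", and the string is scanned once.
function stripCombined(html) {
	return html.replace(/<\/?(?:\w+:\w+|meta|link)\b[^>]*>/gi, "");
}
```

Both functions produce the same output for the same input; the combined 
version simply does it in a single pass. 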


* Fixed HTML comment elimination to handle cases where dashes are not 
present (which is valid HTML). Comments are not always <!-- ... -->; 
they can be as simple as <! ... >. Comments are removed in line 292. 
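
A minimal sketch of an expression that handles both comment forms is 
shown here. It is written for this note and is not necessarily the 
expression used in the plugin; stripComments() is a hypothetical name. 

```javascript
// Matches "<!-- ... -->" (which may contain ">" characters) as well as
// the dash-less form "<! ... >". Assumption: anything starting with "<!"
// is unwanted in pasted content, so a doctype would be removed too.
var commentPattern = /<!(?:--[\s\S]*?--)?[^>]*>/g;

function stripComments(html) {
	return html.replace(commentPattern, "");
}
```

Note that this also removes Word's conditional comments, such as 
<!--[if gte mso 9]> ... <![endif]-->, in one match. 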


* Broke out the test for tag attributes into its own loop, because 
existing code has the potential to delete valid text that is not inside 
a tag. e.g., the text "type=hello" would be deleted from MS Office 
content, even if it is not inside a tag. New code ensures that attrib is 
inside tag, and handles multiple attributes by continuing to refine 
content until no more changes are necessary. (In most cases it is 
handled by 1 or 2 loops, and RegExp test is very quick because it begins 
with a static character.) Starts at line 303. 
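
The loop described above might look something like this sketch. The 
attribute names and the stripAttributes() name are chosen for 
illustration and are not the plugin's actual list. It also assumes that 
attribute values do not themselves contain text that looks like one of 
the listed attributes. 

```javascript
// Matches one unwanted attribute, but only when it follows an opening
// "<tagname" with no ">" in between, i.e. when it is inside a tag.
var attrInTag = /(<[a-z][^>]*?)\s+(?:lang|language|type)=(?:"[^"]*"|'[^']*'|[^\s>]+)/gi;

// Each pass removes at most one attribute per tag, so repeat until a
// pass makes no further change.
function stripAttributes(html) {
	var previous;
	do {
		previous = html;
		html = html.replace(attrInTag, "$1");
	} while (html !== previous);
	return html;
}
```

Text such as "type=hello" outside of a tag is left alone, because the 
match must begin at a "<" that opens a tag. 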


* Fixed class attribute removal code to only remove if it is really 
within a tag. i.e., if the content includes something like 
class=dismissed as part of the regular text, it would be removed by the 
old code. Starts at line 445. 


* Also in the class attribute removal code, combined the with/without 
quotes test into one call to process(). 


* Added default configuration values at top of code, because the default 
values were getting spread all over the file, creating a maintenance 
problem, and making it hard to know the default value for something. 
Also built a private function called getParam that uses the default 
values automatically. 
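
The idea is roughly the following. This is a simplified sketch, not the 
plugin's code: the "settings" argument is a plain object standing in for 
the editor's configuration, and only a few options are listed, with the 
default values given elsewhere in this document. 

```javascript
// All defaults live in one place, so they are easy to find and change.
var defaults = {
	paste_convert_headers_to_strong: false,
	paste_text_use_dialog: false,
	paste_text_sticky: false,
	paste_text_notifyalways: false,
	paste_text_linebreaktype: "p"
};

// Returns the configured value if one was supplied, otherwise the
// default. "settings" is an assumed stand-in for the editor config.
function getParam(settings, name) {
	var value = settings[name];
	return value === undefined ? defaults[name] : value;
}
```

Callers never pass a default themselves, so a default cannot end up with 
two different values in two places. 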


* Added new _createDelegate() private method, which wraps a function in 
context, and optionally passes it arguments. This type of method is very 
important for use with callbacks in general, and I could not find a 
similar function anywhere in TinyMCE. You may want to incorporate the 
method in the TinyMCE utility library, and make it accessible to 
everything. It can be an excellent tool for ensuring context during any 
callback, and it can even handle situations where you supply arguments, 
and the code issuing the callback adds even more arguments (such as an 
event callback adding an event object). I fully documented it so you can 
see how it works. 
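
For readers unfamiliar with the pattern, a delegate of this kind can be 
sketched as below. This is a generic illustration rather than the 
plugin's _createDelegate() itself; in particular, the parameter order 
and the choice to put the supplied arguments before the caller's 
arguments are assumptions. 

```javascript
// Wraps fn so that it always runs with "scope" as its context.
// "args" (optional array) are passed first, followed by any arguments
// supplied by whoever invokes the callback, e.g. an event object.
function createDelegate(fn, scope, args) {
	var supplied = args || [];
	return function () {
		var fromCaller = Array.prototype.slice.call(arguments);
		return fn.apply(scope, supplied.concat(fromCaller));
	};
}

// Usage: the handler keeps its context even when called bare.
var plugin = {
	name: "paste",
	onEvent: function (mode, evt) {
		return this.name + ":" + mode + ":" + evt;
	}
};
var callback = createDelegate(plugin.onEvent, plugin, ["plain"]);
```

Calling callback("keydown") invokes plugin.onEvent with "plain" as the 
first argument and "keydown" as the second, with "this" set to plugin. 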


* New configuration options for paste plugin:

 - paste_convert_headers_to_strong (default=false) - Useful part of old 
paste plugin that never made it into new version. 

 - paste_text_use_dialog (default=false) - Maintains ability to revert 
to legacy functionality. 

 - paste_text_sticky (default=false) - If true, Paste as Text will 
remain activated until the user toggles the button. If false (the 
default), it will revert to regular paste after the next paste by the 
user. 

 - paste_text_notifyalways (default=false) - If true, every time the 
user clicks the Paste as Text button they will see an alert box 
reminding them that they are in plain text mode. If false (the default), 
they will see the alert box one time, and then further reminders will be 
suppressed (saves state in a small cookie value). 

 - paste_text_linebreaktype (default="p") - Sets the way plain text line 
breaks will be encoded. If "p" (the default), regular <p> tags will be 
used to create paragraphs. If "br", two <br /> tags will be used between 
paragraphs. If "none", all line breaks are removed, and all text will 
appear on one line. 

 - paste_text_replacements (default=conversion of ellipsis character and 
"smart quotes") - Allows the user to override and/or augment the 
automatic replacements that will occur during a plain text paste. The 
value for this setting is an array of replacements. Each array element 
can be either a regular expression or an array. If an element is a 
regular expression, all matches of that regular expression are removed 
from the content. If an element is an array, the array must have two 
elements: a regular expression, and a replacement expression. The 
replacement expression can be any valid expression that can be used with 
JavaScript's String.replace() function (i.e., can be a string or a 
function). 
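
To make the paste_text_replacements format concrete, here is a sketch of 
a value for the setting and of how such a list could be applied. The 
sample entries and the applyReplacements() name are hypothetical; the 
plugin's built-in defaults and internal code may differ. 

```javascript
// Example value: each element is a RegExp (matches are removed) or a
// two-element array of [RegExp, replacement string or function].
var replacements = [
	/\u2122/g,                              // remove trademark signs
	[/\u2026/g, "..."],                     // ellipsis character
	[/[\u201C\u201D]/g, '"'],               // smart double quotes
	[/[\u2018\u2019]/g, "'"],               // smart single quotes
	[/\s{2,}/g, function () { return " "; }] // function replacement
];

// Applies the list in order, following the rules described above.
function applyReplacements(text, list) {
	for (var i = 0; i < list.length; i++) {
		var item = list[i];
		if (item instanceof RegExp) {
			text = text.replace(item, "");
		} else {
			text = text.replace(item[0], item[1]);
		}
	}
	return text;
}
```

Because the entries are applied in order, a later entry sees the output 
of the earlier ones. 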


* Changes to existing config options:

 - paste_retain_style_properties - New options "all", "*", and "none". 
The default value has been changed to "all", because the plugin can now 
handle Office styles. Specifying "all" or "*" retains all styles except 
for extra MS Office junk, and specifying "none" or an empty string will 
remove all styles. 


* Changes I'm recommending, but have not made. (If you would like, 
I would be happy to make the modifications that I'm suggesting, but I 
did not want to overstep my bounds without first giving you my line of 
thinking.) 

 - Eliminate the use of the process() inner function in most cases (in 
the _preProcess() method), because it does not really add much, even in 
terms of maintainability, but it does add performance overhead with the 
extra looping. If anything, I would keep a process function as a private 
method shared among all the methods, especially useful for processing 
"paste_text_replacements". But in normal usage it would be just as clear 
-- and quicker -- to do multiple replacements as: 

		h.replace(/ ... /gi, "")
			.replace(/ ... /gi, "")
			.replace(/ ... /gi, "");
			
 - Eliminate _legacySupport() method, and integrate the functionality 
into the main init() method. Use of the Paste as Word dialog is 
important if a user wishes to maintain styles from Excel, and from other 
sources of rich text. It offers a flexibility and an option that I don't 
think you'll ever want to eliminate completely, so differentiating it as 
"legacy" is not necessary. 


* Known issues:

 - The plain text alert messages are currently hard-coded into the 
plugin in English (lines 116 and 119). They should be moved into 
language resources and put into the appropriate language files. 

 - Opera support is weak. It removes all styles from everything. This 
problem exists in current paste plugin, not something new here. 

 - Opera also does a funky move after pasting -- it scrolls the entire 
page up. No doubt this has something to do with the code that attempts 
to scroll pasted content into view in the editor. (Happens in current 
plugin too.) 

 - WebKit browsers strip the <table> tag from pastes involving a table 
(all Excel pastes, perhaps other types of table). This corrupts the 
paste to the point that the user must edit the HTML manually in order to 
fix it. (Should be addressed.) This problem is in the current plugin 
too. 

 - All browsers strip styles from pastes from Excel. Yet, if you open 
the Paste from Word dialog and paste there (at least using IE), you get 
the styles. So something in the execCommand("paste") is removing styles 
-- maybe because it's pasting into a div and not into an iframe? I 
started fooling with it, but it was becoming too time-consuming, and I 
was probably trying things that you've already tried. Eventually this 
should be fixed somehow. 

 - If you follow a sequence of highlighting text in the editor, then 
click the Paste as Text button, then press Ctrl+V, nothing happens. For 
some reason, when text is highlighted at the time the button is toggled, 
focus is not placed back into the editor as it should. You need to 
highlight the text *after* clicking the Paste as Text button to make it 
work. This does not happen when no text is highlighted -- in that case, 
toggling the button and then immediately pressing Ctrl+V properly pastes 
the text. If you know why this happens when text is highlighted, maybe 
you could insert the code to properly set focus back to the editor after 
the toggle (if that's what needs to happen). 

 - In non-IE browsers, if you paste from Excel (or some kind of table 
from another source), you can end up trapping the cursor in the table. 
IE allows you to get the cursor to a point below the table so you can 
start typing, but other browsers trap the cursor in the table, because 
there is no <p> tag below the table. This may need to be addressed. 
(This is a problem with regular pastes, not with plain text.) 

	
